Michael Collyer, Chatham University
16 October, 2019
\[\huge \mathbf{Y}=\mathbf{X}\mathbf{\beta } +\mathbf{E}\]
| Component | Dimension | Description |
|---|---|---|
| \(\mathbf{Y}\) | \(n \times p\) | Data matrix with \(n\) observations for \(p\) variables |
| \(\mathbf{X}\) | \(n \times k\) | Linear model design matrix with \(n\) observations for \(k\) parameters |
| \(\mathbf{\beta}\) | \(k \times p\) | Matrix of coefficients expressing change in values for the \(k\) model parameters for each of \(p\) variables |
| \(\mathbf{E}\) | \(n \times p\) | Matrix of residuals (error) for \(n\) observations for \(p\) variables |
As in any area of statistics, the coefficients of a linear model (which has parameters describing variable change associated with some hypothetical process) are generally unknown population quantities and must therefore be estimated from a sample. We can solve for them algebraically as the solution that would hold if the model had no error (i.e., \(\mathbf{E} = \mathbf{0}\)). The goal is to solve for \(\mathbf{\beta}\). We will accomplish this goal by first recognizing some important properties of general linear models (LM).
The goal of LM is to find a solution for which the estimated value of a data vector, given a vector of predictors, is expected to be true,
\[\mathbb{E}\left( \mathbf{y}^T | \mathbf{x}^T\right) = \mathbf{\mu}_{ \mathbf{y} | \mathbf{x}}\]
The term, Least Squares (LS), references the fact that coefficients estimated from the model design, \(\mathbf{X}\), will minimize the sum squared error between observed data and expected values.
LS calculation of coefficients starts with first standardizing the design matrix by the expected covariance of the data, \(\mathbf{\Omega}\). This is done via
\[\mathbf{\Omega}^{-1}\mathbf{X}\]
If there is no expected covariance structure, then \(\mathbf{\Omega}\) is \(\mathbf{I}\), and because \(\mathbf{I}^{-1} = \mathbf{I}\), no change in \(\mathbf{X}\) would be required. Note that \(\mathbf{\Omega}\) is an \(n \times n\) symmetric matrix.
Solving for the coefficients via LS: \[\mathbf{Y}=\mathbf{X}\mathbf{\beta } \] \[\left(\mathbf{\Omega}^{-1} \mathbf{X} \right)^T \mathbf{Y} = \left(\mathbf{\Omega}^{-1} \mathbf{X} \right)^T\mathbf{X}\mathbf{\beta } \] \[\mathbf{X}^T \mathbf{\Omega}^{-1} \mathbf{Y} =\mathbf{X}^T \mathbf{\Omega}^{-1} \mathbf{X}\mathbf{\beta } \]
\[\left( \mathbf{X}^T \mathbf{\Omega}^{-1} \mathbf{X}\right)^{-1} \mathbf{X}^T \mathbf{\Omega}^{-1} \mathbf{Y} = \left( \mathbf{X}^T \mathbf{\Omega}^{-1} \mathbf{X}\right)^{-1} \mathbf{X}^T \mathbf{\Omega}^{-1} \mathbf{X}\mathbf{\beta } = \mathbf{I\beta}\]
\[\hat{\mathbf{\beta }} = \left( \mathbf{X}^T \mathbf{\Omega}^{-1} \mathbf{X}\right)^{-1} \mathbf{X}^T \mathbf{\Omega}^{-1} \mathbf{Y} \]
The hat (^) reminds us that this is a matrix of estimated values. This is the generalized least squares (GLS) solution. If \(\mathbf{\Omega} = \mathbf{I}\), then the solution simplifies to
\[\hat{\mathbf{\beta }} = \left( \mathbf{X}^T \mathbf{X}\right)^{-1} \mathbf{X}^T \mathbf{Y} \]
which is the ordinary least squares (OLS) solution.
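The OLS solution can be sketched in a few lines of base R; the data here are made up purely for illustration:

```r
# A minimal sketch of the OLS solution, using made-up data
set.seed(1)
n <- 8
X <- cbind(1, rnorm(n))          # design matrix: intercept plus one predictor
Y <- cbind(rnorm(n), rnorm(n))   # n x p data matrix (p = 2)

# beta-hat = (X'X)^{-1} X'Y
B <- solve(crossprod(X)) %*% crossprod(X, Y)

# The same estimates come from R's lm()
B.lm <- coef(lm(Y ~ X[, 2]))
all.equal(unname(B), unname(B.lm))
```

Note that `crossprod(X)` computes \(\mathbf{X}^T\mathbf{X}\) more efficiently than `t(X) %*% X`.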
One might notice that, because of the various matrix inversions, this can be computationally intensive; if \(\mathbf{\Omega} = \mathbf{I}\), it is easier to use the OLS solution. It is also possible to use OLS for GLS calculations through an algebraic short-cut.
The simpler method of coefficient estimation is found through a transformation (projection) matrix, to transform the data and model design prior to the algebraic steps.
Step 1: Perform eigen analysis on \(\mathbf{\Omega}\) and obtain a set of eigenvectors, \(\mathbf{U}\) and eigenvalues, \(\mathbf{W}\).
Step 2: Generate an \(n \times n\) transformation matrix as
\[\mathbf{T} = \left( \mathbf{UW}^{1/2} \mathbf{U}^T \right)^{-1} = \mathbf{UW}^{-1/2} \mathbf{U}^T\] so that \(\mathbf{T}^T\mathbf{T} = \mathbf{\Omega}^{-1}\). Step 3: Project both data and model design matrix onto \(\mathbf{T}\).
\[\mathbf{\tilde{Y}} = \mathbf{TY}\]
\[\mathbf{\tilde{X}} = \mathbf{TX}\]
Step 4: Perform OLS estimation of coefficients using transformed values
\[\hat{\mathbf{\beta }}=\left ( \mathbf{\tilde{X}}^{T} \mathbf{\tilde{X}}\right )^{-1}\left ( \mathbf{\tilde{X}}^{T} \mathbf{\tilde{Y}}\right )\]
The important condition is that
\[\hat{\mathbf{\beta }} = \left( \mathbf{X}^T \mathbf{\Omega}^{-1} \mathbf{X}\right)^{-1} \mathbf{X}^T \mathbf{\Omega}^{-1} \mathbf{Y} = \left ( \mathbf{\tilde{X}}^{T} \mathbf{\tilde{X}}\right )^{-1}\left ( \mathbf{\tilde{X}}^{T} \mathbf{\tilde{Y}}\right ) \] However, these two methods are not the same in terms of exchangeable units under the null hypothesis for resampling procedures, a topic we address later.
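A base-R sketch, with a made-up covariance structure \(\mathbf{\Omega}\), confirms that OLS on the transformed data reproduces the direct GLS solution:

```r
# Sketch: OLS on transformed data equals direct GLS (made-up data)
set.seed(2)
n <- 6
X <- cbind(1, rnorm(n))
Y <- matrix(rnorm(n * 2), n, 2)

A <- matrix(rnorm(n * n), n, n)
Omega <- crossprod(A) + diag(n)   # a symmetric, positive-definite Omega

# Direct GLS: (X' Omega^-1 X)^-1 X' Omega^-1 Y
Oi <- solve(Omega)
B.gls <- solve(t(X) %*% Oi %*% X) %*% t(X) %*% Oi %*% Y

# Transformation: T = U W^{-1/2} U', so that T'T = Omega^{-1}
eig <- eigen(Omega)
Tmat <- eig$vectors %*% diag(1 / sqrt(eig$values)) %*% t(eig$vectors)
B.ols <- solve(crossprod(Tmat %*% X)) %*% crossprod(Tmat %*% X, Tmat %*% Y)

all.equal(B.gls, B.ols)
```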
Coefficients are the tools for making predictions from the linear model, namely
\[\mathbf{\hat{y}} = \mathbf{x\hat{\beta}}\] is an estimated vector of values for the variables of \(\mathbf{Y}\), given the predictors in \(\mathbf{x}\).
If performed for all representations in \(\mathbf{X}\), we get a matrix of fitted values
\[\mathbf{\hat{Y}} = \mathbf{X\hat{\beta}}\] and expanding this equation
\[\hat{\mathbf{Y}}=\mathbf{X}\hat{\mathbf{\beta }}=\mathbf{X}\left ( \mathbf{\tilde{X}}^{T} \mathbf{\tilde{X}}\right )^{-1} \mathbf{\tilde{X}}^{T} \mathbf{\tilde{Y}} = \mathbf{H\tilde{Y}}\] The matrix, \(\mathbf{H}\), is an \(n\times n\) projection matrix. Notice that this is a projection matrix for transformed data. If we prefer to work with the original data, we can let \(\mathbf{H}' = \mathbf{HT}\) be a projection matrix that accounts for both the linear model and the transformation; then \(\hat{\mathbf{Y}} = \mathbf{H}'\mathbf{Y}\). This means that a linear model can be thought of as a means to project data into a tangent space that best describes a hypothetical pattern (associated with a process), including accounting for a hypothetical covariance structure.
\(\mathbf{H}^{'}\) is positive semi-definite and idempotent. If it is symmetric, then it produces an orthogonal projection (typical with OLS). If it is not symmetric, it produces an oblique projection (typical with GLS).
As a reminder, if no covariance structure is used, \(\mathbf{\Omega} = \mathbf{I}\); thus, \(\mathbf{H}^{'} = \mathbf{H}\), and \(\mathbf{H}\) is an orthogonal projection matrix.
The unexplained component of the data is the matrix of residuals, \(\mathbf{E}\), found as \[\mathbf{E} = \mathbf{Y} - \hat{\mathbf{Y}}=\mathbf{Y} - \mathbf{H}'\mathbf{Y} =\left(\mathbf{I}-\mathbf{H}'\right)\mathbf{Y}\] \(\left(\mathbf{I}-\mathbf{H}'\right)\) is also a projection matrix, meaning the residuals are a projection of \(\mathbf{Y}\) into a different tangent space, where attributes of the points can be examined, especially with reference to the fitted value projections.
Therefore, the general linear model finds projections of data as fitted values and residuals, \(\mathbf{H}'\mathbf{Y}\) and \(\left(\mathbf{I}-\mathbf{H}'\right)\mathbf{Y}\), respectively. We will learn how and why this is important in just about every subsequent lecture.
\(\left(\mathbf{I}-\mathbf{H}'\right)\) is positive semi-definite and idempotent. If it is symmetric, then it produces an orthogonal projection (typical with OLS). If it is not symmetric, it produces an oblique projection (typical with GLS).
As a reminder, if no covariance structure is used, \(\mathbf{\Omega} = \mathbf{I}\); thus, \(\left(\mathbf{I}-\mathbf{H}'\right) = \left(\mathbf{I}-\mathbf{H}\right)\), and \(\left(\mathbf{I}-\mathbf{H}\right)\) is an orthogonal projection matrix.
Projection of data into tangent spaces of their model predictions (fit) and error (dispersion of residuals) allows one to determine if there are latent patterns in the data, as predicted by the parameters in the design matrix, and hopefully no pattern in the residuals (if assumptions are upheld). The best way to do this is via PCA on covariance matrices. PCA finds the axes of greatest variation in the projections, which allows a low-dimensional visual interpretation of even high-dimensional spaces.
To calculate a model (residual) covariance matrix, first calculate the sums of squares and cross-products (\(SSCP\)) via a cross-product of the residuals.
Residuals \(SSCP\)
\[\mathbf{S} = \left(\mathbf{TE}\right)^T \mathbf{TE} = \mathbf{E}^T \mathbf{\Omega}^{-1} \mathbf{E}\]
And the covariance matrix is
\[\mathbf{\hat{\Sigma}}_{\mathbf{E}} = \left( n-1\right)^{-1} \mathbf{S}\]
Note that any covariance structure must be accounted for in order for the residual \(SSCP\) to be independent.
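For the OLS case (\(\mathbf{\Omega} = \mathbf{I}\)), the residual \(SSCP\) and covariance matrix can be sketched with made-up data:

```r
# Sketch: residual SSCP and covariance matrix for an OLS fit (Omega = I)
set.seed(3)
n <- 10
X <- cbind(1, rnorm(n))
Y <- matrix(rnorm(n * 3), n, 3)           # made-up n x p data (p = 3)

H <- X %*% solve(crossprod(X)) %*% t(X)   # hat (projection) matrix
E <- (diag(n) - H) %*% Y                  # residuals

S <- crossprod(E)                         # sums of squares and cross-products
Sigma <- S / (n - 1)                      # residual covariance matrix
```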
\[\mathbf{\hat{Y}} = \mathbf{H}\mathbf{\tilde{Y}}\]
\[\mathbf{\tilde{E}} =\left(\mathbf{I}-\mathbf{H}\right)\mathbf{\tilde{Y}}\]
In words, fitted values are obtained from a model projection of transformed data (untransformed if there is no model covariance). Modified residuals are obtained also from a projection of transformed data (unmodified if there is no model covariance).
\[\mathbf{\Delta\hat{Y}} = \left(\mathbf{H}_2 - \mathbf{H}_1 \right)\mathbf{\tilde{Y}}\]
or
\[\mathbf{\Delta\tilde{E}} = \left(\left(\mathbf{I}-\mathbf{H}_1\right) - \left(\mathbf{I}-\mathbf{H}_2 \right) \right) \ \mathbf{\tilde{Y}} = \left(\mathbf{H}_2 - \mathbf{H}_1 \right)\mathbf{\tilde{Y}}\]
In other words, the difference in model projections is a projection of the data, defined as \(\mathbf{\Delta{H}}\). (How does changing the model affect our ability to estimate the data, \(\mathbf{Y}\)?)
\[\mathbf{\Delta{H}}\mathbf{\tilde{Y}}\]
Other than when considering only the total dispersion in a data set, a linear model analysis always considers at least two models.
Much of life is like this - the absolute definition of a state is difficult or meaningless, but the relative difference in states provides meaning to both. (For example, if today is 3 degrees warmer than yesterday, we know something about the weather both today and yesterday, but finding an absolute meaning of temperature is not helpful.) Whether a linear model is a good or poor model for how data are dispersed requires a comparison (especially to a null model).
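This model comparison can be sketched with the projection matrices defined earlier. The response values below are the fictional data used later in these notes; the grouping into consecutive pairs is an assumption for illustration. The difference projection \(\mathbf{\Delta H}\) yields the \(SS\) explained by the groups:

```r
# Sketch: SS explained by adding Group to a null (intercept-only) model
y <- c(0, 2, 6, 4, 3, 3, 4, 7)          # fictional response, as used later
Group <- gl(4, 2)                        # assumed: 4 groups in consecutive pairs
df <- data.frame(y = y, Group = Group)

X1 <- model.matrix(~ 1, data = df)       # null (intercept-only) design
X2 <- model.matrix(~ Group, data = df)   # group-means design

H <- function(X) X %*% solve(crossprod(X)) %*% t(X)
dH <- H(X2) - H(X1)                      # the projection difference, Delta-H

SS.model <- sum((dH %*% y)^2)            # SS explained by group differences
SS.total <- sum((y - mean(y))^2)         # total SS about the grand mean
```

With this assumed grouping, `SS.model` is 25.375 and `SS.total` is 33.875.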
Developing an intuition for how design matrices are constructed helps researchers better understand the analytical methods they are using.
There are different ways to construct model design matrices. We will use the most traditional way, which is how R works.
The best way to illustrate this is with an example.
## Loading required package: RRPP
## Loading required package: rgl
Let’s make up a fictional response, just for illustrative purposes:
## [1] 0 2 6 4 3 3 4 7
Factors (categorical variables) require multiple variables for the various levels of the factor.
## (Intercept) GroupM.Marsh GroupF.Sinkhole GroupM.Sinkhole
## 15 1 0 0 0
## 16 1 0 0 0
## 17 1 1 0 0
## 18 1 1 0 0
## 40 1 0 1 0
## 41 1 0 1 0
## 42 1 0 0 1
## 43 1 0 0 1
## attr(,"assign")
## [1] 0 1 1 1
## attr(,"contrasts")
## attr(,"contrasts")$Group
## [1] "contr.treatment"
Let’s revisit this description from before (next slide)
Factors (categorical variables) require multiple variables for the various levels of the factor.
## (Intercept)
## 15 1
## 16 1
## 17 1
## 18 1
## 40 1
## 41 1
## 42 1
## 43 1
## attr(,"assign")
## [1] 0
## (Intercept) GroupM.Marsh GroupF.Sinkhole GroupM.Sinkhole
## 15 1 0 0 0
## 16 1 0 0 0
## 17 1 1 0 0
## 18 1 1 0 0
## 40 1 0 1 0
## 41 1 0 1 0
## 42 1 0 0 1
## 43 1 0 0 1
## attr(,"assign")
## [1] 0 1 1 1
## attr(,"contrasts")
## attr(,"contrasts")$Group
## [1] "contr.treatment"
The point is that means can be described as vectors from the origin or from a first mean. Let’s see how this works with the R example
X0 <- model.matrix(~ 1, data = df)
Xg <- model.matrix(~ Group + 1, data = df)
B0 <- solve(crossprod(X0)) %*% crossprod(X0, y) # find the overall mean of y
Bg <- solve(crossprod(Xg)) %*% crossprod(Xg, y) # find group means
B0 # the overall mean
## [,1]
## (Intercept) 3.625
## [1] 3.625
Bg # the group coefficients
## [,1]
## (Intercept) 1.0
## GroupM.Marsh 4.0
## GroupF.Sinkhole 2.0
## GroupM.Sinkhole 4.5
To interpret these coefficients: the intercept is the mean of the first group, and each subsequent coefficient is the amount by which that group's mean differs from the first.
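Converting the coefficients back to group means (values copied from the output above) can be sketched as:

```r
# Sketch: treatment-contrast coefficients back to group means
# (coefficient values copied from the output above)
Bg <- c(Intercept = 1.0, M.Marsh = 4.0, F.Sinkhole = 2.0, M.Sinkhole = 4.5)

# First group mean is the intercept; the rest are intercept + deviation
group.means <- c(F.Marsh = Bg[["Intercept"]], Bg[["Intercept"]] + Bg[-1])
group.means
# F.Marsh 1.0, M.Marsh 5.0, F.Sinkhole 3.0, M.Sinkhole 5.5
```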
Note that in R, one can do this
## df$Group: F.Marsh
## [1] 1
## --------------------------------------------------------
## df$Group: M.Marsh
## [1] 5
## --------------------------------------------------------
## df$Group: F.Sinkhole
## [1] 3
## --------------------------------------------------------
## df$Group: M.Sinkhole
## [1] 5.5
Remember that fitted values are found as \(\mathbf{\hat{Y}} = \mathbf{X\hat{\beta}}\). The fitted values are exactly the group means:
## df$Group: F.Marsh
## [1] 1
## --------------------------------------------------------
## df$Group: M.Marsh
## [1] 5
## --------------------------------------------------------
## df$Group: F.Sinkhole
## [1] 3
## --------------------------------------------------------
## df$Group: M.Sinkhole
## [1] 5.5
Analysis of variance is used for determining the significance of group differences (whether group means are a better estimate of shape than the grand mean). For shape data, using randomization of residuals in a permutation procedure (RRPP) is the best way to go (Collyer et al. 2015; Adams & Collyer 2016, 2018).
The equation for RRPP:
\[\mathbf{\mathcal{Y}} = \mathbf{\hat{Y}} + \mathbf{E}_{\left[s,\right]}^*\]
In words, random pseudo-values are generated by holding the transformed fitted values of a (reduced) model constant and randomizing only the transformed residuals. The subscript of the transformed residuals indicates that the rows of the matrix are shuffled, according to \(s\), which is a randomized form of the sequence, \(1,2,3,...,n\).
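A single RRPP iteration can be sketched for the simplest case, an intercept-only null model (so no transformation is needed):

```r
# Sketch: one RRPP iteration under an intercept-only null model
set.seed(6)
y <- c(0, 2, 6, 4, 3, 3, 4, 7)      # the fictional response from before
n <- length(y)

yhat.null <- rep(mean(y), n)        # fitted values of the (reduced) null model
e.null <- y - yhat.null             # residuals of the null model

s <- sample(n)                      # a randomized sequence of 1, 2, ..., n
y.pseudo <- yhat.null + e.null[s]   # pseudo-values: fixed fit + shuffled residuals

# Statistics computed from y.pseudo over many such iterations form an
# empirical null distribution.
```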
Returning to our previous example
##
## Analysis of Variance, using Residual Randomization
## Permutation procedure: Randomization of null model residuals
## Number of permutations: 1000
## Estimation method: Ordinary Least Squares
## Sums of Squares and Cross-products: Type I
## Effect sizes (Z) based on F distributions
##
## Df SS MS Rsq F Z Pr(>F)
## Group 3 25.375 8.4583 0.74908 3.9804 1.2092 0.0885 .
## Residuals 4 8.500 2.1250 0.25092
## Total 7 33.875
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Call: procD.lm(f1 = y ~ Group, data = df, print.progress = FALSE)
pupfish$Group <- interaction(pupfish$Sex, pupfish$Pop)
fit <- procD.lm(coords ~ Group, data = pupfish, print.progress = FALSE)
anova(fit)
##
## Analysis of Variance, using Residual Randomization
## Permutation procedure: Randomization of null model residuals
## Number of permutations: 1000
## Estimation method: Ordinary Least Squares
## Sums of Squares and Cross-products: Type I
## Effect sizes (Z) based on F distributions
##
## Df SS MS Rsq F Z Pr(>F)
## Group 3 0.028363 0.0094543 0.50349 16.901 6.8452 0.001 **
## Residuals 50 0.027970 0.0005594 0.49651
## Total 53 0.056333
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Call: procD.lm(f1 = coords ~ Group, data = pupfish, print.progress = FALSE)
plot(fit, type = "PC", pch = 21, bg = pupfish$Group)
legend("topright", levels(pupfish$Group), pch = 21, pt.bg = 1:4)
## (Intercept) SexM PopSinkhole SexM:PopSinkhole
## 15 1 0 0 0
## 16 1 0 0 0
## 17 1 1 0 0
## 18 1 1 0 0
## 40 1 0 1 0
## 41 1 0 1 0
## 42 1 1 1 1
## 43 1 1 1 1
## attr(,"assign")
## [1] 0 1 2 3
## attr(,"contrasts")
## attr(,"contrasts")$Sex
## [1] "contr.treatment"
##
## attr(,"contrasts")$Pop
## [1] "contr.treatment"
## [,1]
## (Intercept) 1.0
## SexM 4.0
## PopSinkhole 2.0
## SexM:PopSinkhole -1.5
Note the redundancy
## [,1]
## 15 1.0
## 16 1.0
## 17 5.0
## 18 5.0
## 40 3.0
## 41 3.0
## 42 5.5
## 43 5.5
## [,1]
## 15 1.0
## 16 1.0
## 17 5.0
## 18 5.0
## 40 3.0
## 41 3.0
## 42 5.5
## 43 5.5
So why bother with a more complicated model design matrix?
Because these do not make sense unless this factorial design was used
## (Intercept) SexM
## 15 1 0
## 16 1 0
## 17 1 1
## 18 1 1
## 40 1 0
## 41 1 0
## 42 1 1
## 43 1 1
## attr(,"assign")
## [1] 0 1
## attr(,"contrasts")
## attr(,"contrasts")$Sex
## [1] "contr.treatment"
## (Intercept) SexM PopSinkhole
## 15 1 0 0
## 16 1 0 0
## 17 1 1 0
## 18 1 1 0
## 40 1 0 1
## 41 1 0 1
## 42 1 1 1
## 43 1 1 1
## attr(,"assign")
## [1] 0 1 2
## attr(,"contrasts")
## attr(,"contrasts")$Sex
## [1] "contr.treatment"
##
## attr(,"contrasts")$Pop
## [1] "contr.treatment"
So why bother with a more complicated model design matrix?
The two models before were sub-models of this full one
## (Intercept) SexM PopSinkhole SexM:PopSinkhole
## 15 1 0 0 0
## 16 1 0 0 0
## 17 1 1 0 0
## 18 1 1 0 0
## 40 1 0 1 0
## 41 1 0 1 0
## 42 1 1 1 1
## 43 1 1 1 1
## attr(,"assign")
## [1] 0 1 2 3
## attr(,"contrasts")
## attr(,"contrasts")$Sex
## [1] "contr.treatment"
##
## attr(,"contrasts")$Pop
## [1] "contr.treatment"
With a factorial model, \(SS\) can be partitioned (with type I \(SS\)) for more effects, and greater inference is possible.
fitg <- procD.lm(y ~ Group, data = df, iter = 999, SS.type = "I", print.progress = FALSE)
fitf <- procD.lm(y ~ Sex * Pop, data = df, iter = 999, SS.type = "I", print.progress = FALSE)
anova(fitg)
##
## Analysis of Variance, using Residual Randomization
## Permutation procedure: Randomization of null model residuals
## Number of permutations: 1000
## Estimation method: Ordinary Least Squares
## Sums of Squares and Cross-products: Type I
## Effect sizes (Z) based on F distributions
##
## Df SS MS Rsq F Z Pr(>F)
## Group 3 25.375 8.4583 0.74908 3.9804 1.2092 0.0885 .
## Residuals 4 8.500 2.1250 0.25092
## Total 7 33.875
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Call: procD.lm(f1 = y ~ Group, iter = 999, SS.type = "I", data = df,
## print.progress = FALSE)
##
## Analysis of Variance, using Residual Randomization
## Permutation procedure: Randomization of null model residuals
## Number of permutations: 1000
## Estimation method: Ordinary Least Squares
## Sums of Squares and Cross-products: Type I
## Effect sizes (Z) based on F distributions
##
## Df SS MS Rsq F Z Pr(>F)
## Sex 1 21.125 21.125 0.62362 9.9412 1.7018 0.0305 *
## Pop 1 3.125 3.125 0.09225 1.4706 0.2450
## Sex:Pop 1 1.125 1.125 0.03321 0.5294 0.4985
## Residuals 4 8.500 2.125 0.25092
## Total 7 33.875
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Call: procD.lm(f1 = y ~ Sex * Pop, iter = 999, SS.type = "I", data = df,
## print.progress = FALSE)
fitPup <- procD.lm(coords ~ Sex * Pop, data = pupfish, iter = 999, SS.type = "I", print.progress = FALSE)
anova(fitPup)
##
## Analysis of Variance, using Residual Randomization
## Permutation procedure: Randomization of null model residuals
## Number of permutations: 1000
## Estimation method: Ordinary Least Squares
## Sums of Squares and Cross-products: Type I
## Effect sizes (Z) based on F distributions
##
## Df SS MS Rsq F Z Pr(>F)
## Sex 1 0.015780 0.0157802 0.28012 28.209 5.4624 0.001 **
## Pop 1 0.009129 0.0091294 0.16206 16.320 5.1098 0.001 **
## Sex:Pop 1 0.003453 0.0034532 0.06130 6.173 3.6978 0.001 **
## Residuals 50 0.027970 0.0005594 0.49651
## Total 53 0.056333
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Call: procD.lm(f1 = coords ~ Sex * Pop, iter = 999, SS.type = "I",
## data = pupfish, print.progress = FALSE)
plot(fitPup, type = "PC", pch = 21, bg = pupfish$Group)
legend("topright", levels(pupfish$Group), pch = 21, pt.bg = 1:4)

One last comment about the factorial model design:
It might be apparent with the previous example, or any other example with which one might be familiar, that there is often more than a null model and a full model. There might be different effects that could be considered for different sets of parameters. In the pupfish example, we might ask whether \(Sex\) or \(Population\) contributes more to the dispersion in means. We might also wonder if the \(Sex\) effect varies between populations.
Not to worry! The model comparison approach can be done in sequence for different model effects. There are only a few little twists with which to be concerned.
In the initial pupfish example, the null and reduced models were the same; the full model had parameters to estimate four group means. We are going to use the same model now to consider partial effects associated with the dispersion of the four means.
In the pupfish example, we could have the following set of model comparisons to estimate different effects
| Effect | Reduced | Full | Purpose |
|---|---|---|---|
| Sex | Null | Null + Sex | Determine if Sex mean difference is “significant” compared to an overall mean |
| Pop | Null + Sex | Null + Sex + Pop | Determine if population differences are significant, after accounting for general difference between males and females |
| Sex:Pop | Null + Sex + Pop | Null + Sex + Pop + Sex:Pop | Determine if sexual dimorphism varies between populations, after accounting for general difference in sex and population. |
| Full model | Null | Null + Sex + Pop + Sex:Pop | Ascertain if the model is significant (if not, the other parts should not be considered significant either) |
Estimating \(SS\) between models for the first three model comparisons is called Sequential (Type I) \(SS\) estimation. We start with a null model and build up to the fullest model.
In the pupfish example, we could have the following set of model comparisons to estimate different effects
| Effect | Reduced | Full | Purpose |
|---|---|---|---|
| Sex | Null + Pop | Null + Pop + Sex | Determine if Sex mean difference is “significant”, after accounting for population mean difference |
| Pop | Null + Sex | Null + Sex + Pop | Determine if population differences are significant, after accounting for general difference between males and females |
| Sex:Pop | Null + Sex + Pop | Null + Sex + Pop + Sex:Pop | Determine if sexual dimorphism varies between populations, after accounting for general difference in sex and population. |
| Full model | Null | Null + Sex + Pop + Sex:Pop | Ascertain if the model is significant (if not, the other parts should not be considered significant either) |
Estimating \(SS\) between models for the first three model comparisons is called Conditional (Type II) \(SS\) estimation. We analyze effects with respect to other effects, but do not include interactions with targeted effects. For example, with three factors, A, B, and C, the effect A would have a reduced model of B, C, and B:C; the full model would add A to these. The effect A:B would have a reduced model of A, B, C, and B:C, and the full model would add A:B to these.
In the pupfish example, we could have the following set of model comparisons to estimate different effects
| Effect | Reduced | Full | Purpose |
|---|---|---|---|
| Sex | Null + Pop + Pop:Sex | Null + Pop + Sex + Pop:Sex | Determine if Sex mean difference is “significant”, after accounting for all other effects |
| Pop | Null + Sex + Pop:Sex | Null + Sex + Pop + Pop:Sex | Determine if population differences are significant, after accounting for all other effects. |
| Sex:Pop | Null + Sex + Pop | Null + Sex + Pop + Sex:Pop | Determine if sexual dimorphism varies between populations, after accounting for general difference in sex and population. |
| Full model | Null | Null + Sex + Pop + Sex:Pop | Ascertain if the model is significant (if not, the other parts should not be considered significant either) |
Estimating \(SS\) between models for the first three model comparisons is called Marginal (Type III) \(SS\) estimation. Whereas Type I \(SS\) evaluates the importance of effects through inclusion, Type III \(SS\) evaluates the importance of effects through exclusion.
fit1 <- procD.lm(coords ~ Pop * Sex, data = pupfish, iter = 999, SS.type = "I", print.progress = FALSE)
fit2 <- procD.lm(coords ~ Pop * Sex, data = pupfish, iter = 999, SS.type = "II", print.progress = FALSE)
fit3 <- procD.lm(coords ~ Pop * Sex, data = pupfish, iter = 999, SS.type = "III", print.progress = FALSE)
anova(fit1)
##
## Analysis of Variance, using Residual Randomization
## Permutation procedure: Randomization of null model residuals
## Number of permutations: 1000
## Estimation method: Ordinary Least Squares
## Sums of Squares and Cross-products: Type I
## Effect sizes (Z) based on F distributions
##
## Df SS MS Rsq F Z Pr(>F)
## Pop 1 0.008993 0.0089927 0.15964 16.076 4.5610 0.001 **
## Sex 1 0.015917 0.0159169 0.28255 28.453 5.4094 0.001 **
## Pop:Sex 1 0.003453 0.0034532 0.06130 6.173 3.6978 0.001 **
## Residuals 50 0.027970 0.0005594 0.49651
## Total 53 0.056333
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Call: procD.lm(f1 = coords ~ Pop * Sex, iter = 999, SS.type = "I",
## data = pupfish, print.progress = FALSE)
##
## Analysis of Variance, using Residual Randomization
## Permutation procedure: Randomization of null model residuals
## Number of permutations: 1000
## Estimation method: Ordinary Least Squares
## Sums of Squares and Cross-products: Type II
## Effect sizes (Z) based on F distributions
##
## Df SS MS Rsq F Z Pr(>F)
## Pop 1 0.009129 0.0091294 0.16206 16.320 5.1098 0.001 **
## Sex 1 0.015917 0.0159169 0.28255 28.453 5.4094 0.001 **
## Pop:Sex 1 0.003453 0.0034532 0.06130 6.173 3.6978 0.001 **
## Residuals 50 0.027970 0.0005594 0.49651
## Total 53 0.056333
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Call: procD.lm(f1 = coords ~ Pop * Sex, iter = 999, SS.type = "II",
## data = pupfish, print.progress = FALSE)
##
## Analysis of Variance, using Residual Randomization
## Permutation procedure: Randomization of null model residuals
## Number of permutations: 1000
## Estimation method: Ordinary Least Squares
## Sums of Squares and Cross-products: Type III
## Effect sizes (Z) based on F distributions
##
## Df SS MS Rsq F Z Pr(>F)
## Pop 1 0.007479 0.0074790 0.13276 13.370 4.8507 0.001 **
## Sex 1 0.015253 0.0152534 0.27077 27.267 5.3985 0.001 **
## Pop:Sex 1 0.003453 0.0034532 0.06130 6.173 3.6978 0.001 **
## Residuals 50 0.027970 0.0005594 0.49651
## Total 53 0.056333
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Call: procD.lm(f1 = coords ~ Pop * Sex, iter = 999, SS.type = "III",
## data = pupfish, print.progress = FALSE)
Let’s return to the pupfish body shape example.
An astute observer might start to wonder which pupfish means are different from each other.
One thing of which to be aware: models with interactions provide much more opportunity than simpler single-factor models. Recall that these models produce the same means:
fitg <- procD.lm(coords ~ Group, data = pupfish, print.progress = FALSE, iter = 999)
fitf <- procD.lm(coords ~ Pop * Sex, data = pupfish, print.progress = FALSE, iter = 999)
fitg
##
## Linear Model fit with lm.rrpp
##
## Number of observations: 54
## Number of dependent variables: 112
## Data space dimensions: 53
## Sums of Squares and Cross-products: Type I
## Number of permutations: 1000
## Call: procD.lm(f1 = coords ~ Group, iter = 999, data = pupfish, print.progress = FALSE)
##
## Linear Model fit with lm.rrpp
##
## Number of observations: 54
## Number of dependent variables: 112
## Data space dimensions: 53
## Sums of Squares and Cross-products: Type I
## Number of permutations: 1000
## Call: procD.lm(f1 = coords ~ Pop * Sex, iter = 999, data = pupfish,
## print.progress = FALSE)
Let’s see, however, how pairwise hypothesis tests differ between them.
There are two statistics for pairwise comparisons that might be of interest. Recall that any estimated value is a vector. A vector has two attributes: length and direction. With two or more vectors, these attributes are related. Anyone who has studied trigonometry can attest that two vectors originating from the same point (origin) describe a triangle. The vectors are oriented with an angle between them, and the vector between vector endpoints is also a vector with a direction and length (distance). The following figure illustrates this.
The attributes, vector length (distance, \(d\)) and angle (\(\theta\)), can function as test statistics. These statistics have an expected value of 0 under a null hypothesis. (Vector correlation can also be a test statistic, but it has an expected value of 1 for parallel vectors, which is a bit awkward.)
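For two hypothetical coefficient vectors, both statistics are easy to compute:

```r
# Sketch: distance and angle between two hypothetical vectors
v1 <- c(2, 1)
v2 <- c(1, 2)

d <- sqrt(sum((v1 - v2)^2))    # length of the difference vector (distance)
vc <- sum(v1 * v2) / (sqrt(sum(v1^2)) * sqrt(sum(v2^2)))  # vector correlation
theta <- acos(vc)              # angle between the vectors, in radians
```

Here \(d = \sqrt{2}\) and the vector correlation is 0.8, so \(\theta \approx 0.64\) radians; under a null hypothesis of no difference, both \(d\) and \(\theta\) are expected to be 0.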
Here is the beautiful thing! I mean this is really, really awesome!
When performing RRPP, we have thousands of random outcomes of coefficients (vectors). These might represent different slopes or ways to estimate vectors between means. Therefore, we inherently have distributions of these geometric statistics. There is no need for a post-hoc, manufactured test! We can simply generate in situ distributions from the resampling experiment!
##
## Pairwise comparisons
##
## Groups: F.Marsh M.Marsh F.Sinkhole M.Sinkhole
##
## RRPP: 1000 permutations
##
## Pairwise comparisons
##
## Groups: F.Marsh M.Marsh F.Sinkhole M.Sinkhole
##
## RRPP: 1000 permutations
For any pairwise test, there is a null model. Let’s see the null models for these pairwise tests:
## Reduced Full
## Group 1 Group <- Null/Full inherent in pairwise
## Reduced Full
## Pop 1 Pop
## Sex Pop Pop + Sex
## Pop:Sex Pop + Sex Pop + Sex + Pop:Sex <- Null/Full inherent in pairwise
Now we have choices! For example,
##
## Pairwise comparisons
##
## Groups: F.Marsh M.Marsh F.Sinkhole M.Sinkhole
##
## RRPP: 1000 permutations
##
## LS means:
## Vectors hidden (use show.vectors = TRUE to view)
##
## Pairwise distances between means, plus statistics
## d UCL (95%) Z Pr > d
## F.Marsh:M.Marsh 0.04611590 0.01901786 9.058460 0.001
## F.Marsh:F.Sinkhole 0.03302552 0.01902670 5.549687 0.001
## F.Marsh:M.Sinkhole 0.03881514 0.01898659 7.176668 0.001
## M.Marsh:F.Sinkhole 0.04605211 0.02049762 7.859520 0.001
## M.Marsh:M.Sinkhole 0.02802087 0.01932013 3.987375 0.004
## F.Sinkhole:M.Sinkhole 0.02568508 0.02013263 3.311133 0.008
Now we have choices! For example,
##
## Pairwise comparisons
##
## Groups: F.Marsh M.Marsh F.Sinkhole M.Sinkhole
##
## RRPP: 1000 permutations
##
## LS means:
## Vectors hidden (use show.vectors = TRUE to view)
##
## Pairwise distances between means, plus statistics
## d UCL (95%) Z Pr > d
## F.Marsh:M.Marsh 0.04611590 0.04297378 2.3333663 0.007
## F.Marsh:F.Sinkhole 0.03302552 0.03312516 1.6978761 0.055
## F.Marsh:M.Sinkhole 0.03881514 0.04633821 -0.5473194 0.699
## M.Marsh:F.Sinkhole 0.04605211 0.05506197 -0.2592230 0.597
## M.Marsh:M.Sinkhole 0.02802087 0.03343901 0.1293830 0.420
## F.Sinkhole:M.Sinkhole 0.02568508 0.04364031 -2.1104002 0.993
Note that in this example, we used test.type = "dist" because it makes sense to look at the differences among means in terms of how spread out they are in the data space. Also notice the similarities and differences between the two tests:
The latter test makes sense, if we look again at the PC plot
plot(fitf, type = "PC", pch = 21, bg = pupfish$Group)
legend("topright", levels(pupfish$Group), pch = 21, pt.bg = 1:4)

The first test tells us the means are different. The second test tells us that despite population differences in shape and despite sexual dimorphism, in general, the sexual dimorphism in the Marsh population is greater than the sexual dimorphism in the Sinkhole population.
What about comparing slopes among groups that might have different patterns of covariation?
Pairwise statistics are geometric attributes
More on pairwise approaches in Trajectory Analysis and Allometry